Geostatistical modelling

Geostatistical Data

In many ecological and environmental settings, measurements are taken from fixed sampling units, aiming to quantify spatial variation and interpolate values at unobserved sites.

  • Geostatistical data are the most common form of spatial data found in environmental and ecological settings.
  • We regularly take measurements of an environmental variable of interest at a set of fixed locations.
  • This could be data from samples taken across a region (e.g., water depth in a lake) or from monitoring stations that form part of a network (e.g., air pollution).
  • In each of these cases, our goal is to estimate the value of our variable across the entire space.

Understanding our region

  • Let \(D\) be our two-dimensional region of interest.
  • In principle, there are infinitely many locations within \(D\), each of which can be represented by mathematical coordinates (e.g. latitude and longitude).
  • We can identify any individual location as \(\mathbf{s}_i = (x_i, y_i)\), where \(x_i\) and \(y_i\) are its coordinates.
  • We can treat our variable of interest as a random variable \(Z\), which can be observed at any location as \(Z(\mathbf{s}_i)\).

Geostatistical process

  • A geostatistical process can therefore be written as: \[\{Z(\mathbf{s}); \mathbf{s} \in D\}\]
  • In practice, data are observed at a finite number of locations, \(m\), and can be denoted as: \[z = \{z(\mathbf{s}_1), \ldots, z(\mathbf{s}_m) \}\]
  • We have observed our data at \(m\) locations, but often want to predict this process at a set of unknown locations.
  • For example, what is the value of \(z(\mathbf{s}_0)\), where \(\mathbf{s}_0\) is an unobserved site?

Spatial Dependence

Spatial autocorrelation

  • The key challenge in modelling geostatistical data is understanding correlation.
  • Typically observations close together in space will be more similar than those which are further apart.
  • Spatial correlation is usually driven by some unmeasured confounding variable(s) - for example, air pollution is spatially correlated because nearby areas tend to experience similar traffic levels.
  • It is important that we account for these correlations in our analysis - failing to do so will lead to poor inference.

Modelling geostatistical data

For a set of geostatistical data \(\mathbf{z} = \{ z(\mathbf{s}_1), \ldots, z(\mathbf{s}_m) \}\), we can consider the general model:

\[Z(\mathbf{s}_i) = \mu(\mathbf{s}_i) + e(\mathbf{s}_i)\]

  • \(\mu(\mathbf{s}_i)\) is a mean function which models trend and covariate effects.

  • Then \(e(\mathbf{s}_i)\) is the error process which accounts for any spatial correlation which exists after accounting for \(\mu(\mathbf{s}_i)\)

  • Spatial statistics is therefore often focused on understanding the process for \(e(\mathbf{s}_i)\).

Our key problem(s)

  • We have observations at \(m\) locations \[\mathbf{z} = \{ z(\mathbf{s}_1), \ldots, z(\mathbf{s}_m) \}.\]
  • We want to use these to obtain an estimate of \(Z(\mathbf{s}_0)\) where \(\mathbf{s}_0\) is an unobserved location.
  • How do we model the spatial dependence between our observed sites \(\mathbf{s}_1, \ldots, \mathbf{s}_m\)?
  • What does this tell us about the dependence between our observed sites and our unobserved site \(\mathbf{s}_0\)?

Variograms

The first step is to assess whether there is any evidence of spatial dependency in our data.

  • Spatial dependence in georeferenced data can be explored by a function known as a variogram \(2\gamma(\cdot)\) (or semivariogram \(\gamma(\cdot)\)).

  • The variogram is similar in many ways to the autocorrelation function used in time series modelling.

  • In simple terms, it is a function which measures the difference in the spatial process between a pair of locations a fixed distance apart.

Variograms

The variogram measures the variance of the difference in the process \(Z(\cdot)\) at two spatial locations \(\mathbf{s}\) and \(\mathbf{s+h}\), and is defined as:

\[\mathrm{Var}[Z(\mathbf{s}) - Z(\mathbf{s} + \mathbf{h})] = E[(Z(\mathbf{s}) - Z(\mathbf{s} + \mathbf{h}))^2] = 2\gamma_z(\mathbf{h}).\]

Note that in practice we work with the semi-variogram \(\gamma_z(\mathbf{h})\): each squared difference involves a pair of points, so halving it gives the variance contribution per point at a given lag.

  • When the variance of the difference \(Z(\mathbf{s}) - Z(\mathbf{s} + \mathbf{h})\) is relatively small, then \(Z(\mathbf{s})\) and \(Z(\mathbf{s} + \mathbf{h})\) are similar (spatially correlated).

  • When the variance of the difference \(Z(\mathbf{s}) - Z(\mathbf{s} + \mathbf{h})\) is relatively large, then \(Z(\mathbf{s})\) and \(Z(\mathbf{s} + \mathbf{h})\) are less similar (closer to independence).

Variograms

A plot of the empirical semivariogram against the separation distance conveys important information about the continuity and spatial variability of the process.

  • The sill is the limiting (maximum) semivariance as \(h \to \infty\).
  • The partial sill represents the spatially structured component of the variability (the sill minus the nugget).
  • The nugget is the limiting semivariance as \(h \to 0\) and represents variability at distances smaller than the sampling interval (e.g. measurement error).
  • The range is the distance at which the semivariogram reaches the sill.
  • Points further apart than the range are assumed to be uncorrelated.
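These features can be made concrete with a parametric semivariogram model. A minimal base-R sketch using an exponential model (one common choice; the parameter values here are illustrative, not from the data):

```r
# exponential semivariogram: gamma(h) = nugget + psill * (1 - exp(-h / phi))
# nugget + psill is the sill; the sill is approached asymptotically,
# so the "practical range" is roughly 3 * phi for this model
semivar_exp <- function(h, nugget, psill, phi) {
  nugget + psill * (1 - exp(-h / phi))
}

h <- seq(0, 300, by = 1)
g <- semivar_exp(h, nugget = 0.5, psill = 2, phi = 50)
# g starts near the nugget (0.5) and levels off at the sill (2.5)
```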

Example: Constructing a Variogram

In practice, we only observe the process at \(m\) locations, and therefore we have to estimate the variogram. This estimate is known as the empirical variogram.

The empirical semivariogram can be used as an exploratory tool to assess whether the data present spatial correlation.

We obtain this by computing the semi-variance for all possible pairs of observations: \(\gamma(\mathbf{s}, \mathbf{s} + \mathbf{h}) = 0.5(Z(\mathbf{s}) - Z(\mathbf{s} + \mathbf{h}))^2\).

Example: Constructing a Variogram

The parana data from the geoR package contain the average rainfall over different years for the period May to June at 143 monitoring stations in Paraná state, Brazil.

Rainfall values measured at 143 recording stations in Paraná state, Brazil with low values being represented in blue and high values in red.

Example: Constructing a Variogram

To illustrate how an empirical variogram is computed, consider the two highlighted locations below.


  1. We can first compute the distance between the two locations using the Euclidean distance formula \[h = \sqrt{(475.1 - 403)^2 + (83.6 - 164.5)^2} = 108.36\]

  2. Next, we compute the semi-variance between the points using their observed values as \[\begin{aligned}\gamma(\mathbf{s}, \mathbf{s}+\mathbf{h}) &= 0.5(Z(\mathbf{s}) - Z(\mathbf{s}+\mathbf{h}))^2 \\ &= 0.5(315.33 - 306.9)^2 = 35.53\end{aligned}\]

  3. We repeat this process for every possible pair of points, and plot \(h\) against \(\gamma(\mathbf{s}, \mathbf{s}+\mathbf{h})\) for each pair.
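The two steps above can be checked directly in R, using the coordinates and values of the highlighted pair:

```r
# coordinates (km) and rainfall values of the two highlighted stations
s1 <- c(475.1, 83.6);  z1 <- 315.33
s2 <- c(403.0, 164.5); z2 <- 306.90

h       <- sqrt(sum((s1 - s2)^2))   # Euclidean distance, approx. 108.4
gamma_h <- 0.5 * (z1 - z2)^2        # semi-variance for this pair, approx. 35.53
```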

Example: Constructing a Variogram

We can calculate the empirical variogram for the data using the variogram function from the gstat library.

Empirical variogram values corresponding to the rainfall data in Paraná state, Brazil.
  • This plot shows the semi-variances for each pair of points.

  • Each pair of points has a different distance, making it difficult to use this for prediction.
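A minimal sketch of how this variogram cloud could be obtained with gstat, assuming parana_sf is an sf POINT object with a numeric value column (as printed later in these slides):

```r
library(gstat)
library(sf)

# semi-variance for every pair of points (cloud = TRUE),
# and the binned version discussed in the next section (cloud = FALSE)
vg_cloud  <- variogram(value ~ 1, data = parana_sf, cloud = TRUE)
vg_binned <- variogram(value ~ 1, data = parana_sf)

plot(vg_cloud)
```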

Binned variogram

To make the variogram easier to use and interpret, we divide the distances into a set of discrete bins, and compute the average semi-variance in each.

  • We compute this binned empirical semivariogram as \[\hat{\gamma}(h_k) = \frac{1}{2|N(h_k)|}\sum_{(\mathbf{s}_i,\mathbf{s}_j) \in N(h_k)}[z(\mathbf{s}_i) - z(\mathbf{s}_j)]^2\]
  • Here, \(N(h_k)\) is the set of pairs whose separation distance falls into bin \(k\), \(|N(h_k)|\) is the number of such pairs, and \(h_k\) is the average distance within the bin.
  • We then construct a plot of our empirical variogram and use this to estimate the covariance structure.
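The binning step is straightforward to sketch in base R on a toy dataset (the coordinates and values below are made up):

```r
# toy data: m = 5 stations
coords <- cbind(x = c(0, 1, 3, 4, 6), y = c(0, 2, 1, 3, 0))
z      <- c(10, 12, 9, 15, 11)

# all pairwise distances and semi-variances (the variogram cloud)
pr <- t(combn(length(z), 2))
h  <- sqrt(rowSums((coords[pr[, 1], ] - coords[pr[, 2], ])^2))
sv <- 0.5 * (z[pr[, 1]] - z[pr[, 2]])^2

# average semi-variance within distance bins of width 3
bins         <- cut(h, breaks = seq(0, ceiling(max(h)), by = 3))
gamma_binned <- tapply(sv, bins, mean)
h_binned     <- tapply(h,  bins, mean)
```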

Binned variogram

The bins are illustrated on the left, and the empirical variogram obtained from them is shown on the right.

Do we observe any spatial dependence?

We can construct a null envelope based on permutations of the data values across the locations, i.e. an envelope built under the assumption of no spatial correlation.

  • By overlaying these envelopes on the empirical variogram we can determine whether there is spatial dependence in our data.

  • We can construct permutation envelopes on the empirical variogram using the envelope function from the variosig R package.

  • In this example, we observe that the variogram falls outside of the null envelope at distances below \(200\) km and again at distances above \(300\) km.

  • Once we have computed an empirical variogram, we have to think about model fitting.

[1] "There are 13 out of 15 variogram estimates outside the 95% envelope."

Spatial Modelling

What We Observed and What We Did Not

We treat the observed process of interest as being measured with error

\[ (\text{observed value})_i = (\text{true value at location } i) + (\text{error})_i \]

alternatively

\[ y_i = Z(\mathbf{s}_i) + \varepsilon_i \]

When geostatistical data are considered, we can often assume that there is a spatially continuous variable underlying the observations that can be modeled using a random field.

  • we have a process that is occurring everywhere in space \(\rightarrow\) natural to try to model it using some sort of function (of space)

  • a random field is a random function that generates smooth surfaces

  • This is hard

  • We typically make our lives easier by making everything Gaussian

Gaussian Random Fields

A Gaussian random field (GRF) is a collection of random variables, where observations occur in a continuous domain, and where every finite collection of random variables has a multivariate normal distribution

\[ \mathbf{z} = (z(\mathbf{s}_1),\ldots,z(\mathbf{s}_m)) \sim N(\boldsymbol{\mu},\Sigma), \qquad \boldsymbol{\mu} = (\mu(\mathbf{s}_1),\ldots,\mu(\mathbf{s}_m)), \]

where \(\Sigma_{ij} = \mathrm{Cov}(z(\mathbf{s}_i),z(\mathbf{s}_j))\) is a dense \(m \times m\) matrix.

  • This is actually quite tricky: \(\Sigma\) will need to depend on the set of observation sites and has to behave well (be “positive definite”)
  • Use a covariance function \(C_z(\cdot,\cdot)\) that depends on the distance between two points (\(\Sigma_{ij} = C_z(\mathbf{s}_i,\mathbf{s}_j)\)) and that
    • has no negative variances
    • is symmetric
    • is decreasing, with maximum at distance = 0
  • \(C_z(\mathbf{s}_i, \mathbf{s}_j)\) measures the strength of the linear dependence between \(Z(\mathbf{s}_i)\) and \(Z(\mathbf{s}_j)\).
  • \(C_z(\mathbf{s}_i, \mathbf{s}_j) = \mathrm{Var}(Z(\mathbf{s}_i))\) for \(i = j\).

Stationarity and isotropy

We will assume our Gaussian process is weakly stationary, meaning the following criteria are met:

  1. \(E[{Z(\mathbf{s})}] = \mu_z(\mathbf{s}) = \mu_z\): a finite constant which does not depend on \(\mathbf{s}\).

  2. \(\text{Cov}(Z(\mathbf{s}_i),Z(\mathbf{s}_j)) = C_z(\mathbf{s}_i-\mathbf{s}_j)\): a finite quantity which depends only on the separation \((\mathbf{s}_i-\mathbf{s}_j)\).

  • Condition 1 states that our mean function must be constant in space, with no overall spatial trend.

  • Condition 2 states that for any two locations, their covariance depends only on their spatial lag \(\mathbf{h} = \mathbf{s}_i-\mathbf{s}_j\), not on their absolute positions.

Our process is said to be isotropic if the covariance function is directionally invariant. This means that the covariance depends only on the Euclidean distance \((||\mathbf{s}_i-\mathbf{s}_j||)\) and not on the direction.

Building the model

The first step in defining a model for a random field in a hierarchical framework is to identify a probability distribution for the observations available at \(m\) sampled spatial locations, represented by the vector \(\mathbf{y} = (y_1,\ldots,y_m)\).

For example, if we assume our observations follow a Gaussian distribution then

\[ \begin{aligned} Y_i &\sim N(\mu_i,\tau_e^{-1})\\ \eta_i &=\mu_i = \beta_0 + \ldots + Z(\mathbf{s}_i) \end{aligned} \]

  • \(\tau_e^{-1} = \sigma^2_e\) represents the variance of the zero-mean measurement error (equivalent to the nugget effect)

  • The response mean \(\mu_i\), which coincides with the linear predictor \(\eta_i\), is defined based on:

    • the intercept \(\beta_0\) and any additional covariates

    • the realization of the latent (unobservable) GF \(Z(\mathbf{s}) \sim \mathrm{MVN}(0,\Sigma)\) which accounts for the spatial correlation through \(\Sigma_{ij} = C_z(\mathbf{s}_i,\mathbf{s}_j)\).

The Matérn Field

A commonly used covariance function is the Matérn covariance function. The covariance of two points a distance \(h = ||\mathbf{s}_i - \mathbf{s}_j||\) apart is:

\[ \Sigma_{ij} = C_{\nu}(h) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\, h}{\rho} \right)^{\nu} K_{\nu} \left( \frac{\sqrt{2\nu}\, h}{\rho} \right) \]

  • \(\Gamma(\cdot)\) is the gamma function

  • \(K_{\nu}(\cdot)\) is the modified Bessel function of the second kind.

  • \(\sigma^2\), \(\rho\) and \(\nu\) are the non-negative parameters of the covariance function:

    • \(\sigma^2\) is the spatially structured variance component

    • \(\rho\) is the range of the spatial process

    • \(\nu\) controls smoothness of the spatial process.
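As a sanity check, this covariance function can be implemented directly in base R with the built-in besselK; for \(\nu = 1/2\) the Matérn reduces to the exponential covariance \(\sigma^2 \exp(-h/\rho)\):

```r
# Matern covariance for distances h (vectorised);
# sigma2 = variance, rho = range parameter, nu = smoothness
matern_cov <- function(h, sigma2, rho, nu) {
  kh <- sqrt(2 * nu) * h / rho
  ifelse(h == 0, sigma2,
         sigma2 * (2^(1 - nu) / gamma(nu)) * kh^nu * besselK(kh, nu))
}

matern_cov(0, sigma2 = 2, rho = 10, nu = 1)   # the variance, 2
matern_cov(1, sigma2 = 1, rho = 1, nu = 0.5)  # equals exp(-1)
```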

Big n problem!

The disadvantage of the modelling approach involving the spatial covariance function is known as the “big n problem”: the computational cost of linear algebra operations with dense covariance matrices such as \(\Sigma\).

In particular, dense matrix operations scale cubically with the matrix size, which is given by the number of locations at which the process is observed. A computationally effective alternative is given by the stochastic partial differential equation (SPDE) approach.

The SPDE approach

We define a (Matérn) GRF as the solution of a stochastic partial differential equation (SPDE)

\[ (\kappa^2-\Delta)^{\alpha/2}Z(t) = W(t) \]

What is this?

  • \(W(t)\) is Gaussian white noise
  • \(Z(t)\) is the smooth process we want
  • \((\kappa^2-\Delta)^{\alpha/2}\) is an operator that “smooths” the white noise
  • \(\kappa\) and \(\alpha\) are parameters

One analogy

Imagine a guitar string stretched from left to right.

  • Now imagine someone randomly taps along it at many locations:

    • Each tap is independent
    • Some taps are strong, some weak
    • There is no coordination between taps
    • A tap at one location tells you nothing about a tap nearby

This is pure randomness. But a real string does not behave like this.

One analogy

Imagine a guitar string stretched from left to right.

  • The string has tension and stiffness:
    • the tension spreads each tap to nearby points
    • sharp jumps are softened
  • \(\Delta\) measures the local curvature
  • \(\kappa\) controls how far the randomness propagates
  • stronger tension (large \(\kappa\)) means a stronger pull back, so the randomness dies out over shorter distances

\[(\color{red}{\kappa^2}-\Delta)Z(t) = W(t), ~~~\text{for } \alpha=2\]

One analogy

  • Stiffness controls how smooth the field becomes (\(\alpha\))
  • The larger \(\alpha\), the smoother the process

\[ (\kappa^2-\Delta)^{\color{red}{\alpha}/2}Z(t) = W(t) \]

Solving the SPDE

Ok…but we still need to solve the SPDE to find \(Z(t)\)!

Now we need to discretize the domain into \(T\) points (we cannot compute on a continuum!)

We represent our solution as

\[ Z(t) = \sum_{i = 1}^T\psi_i(t)w_i \]

Where

  • \(\psi_i(t)\) are (known) basis functions for nodes \(i=1,\ldots,T\)

    • \(\psi_i(t_i)= 1\)
    • \(\psi_i(t_j) = 0 ~~\forall~~i \neq j\)
    • Linear between neighboring nodes

  • \(w_i\) are (unknown) weights

    • the field value \(Z(t)\) is a linear interpolation between the two neighboring weights

  • The solution is therefore approximated using a finite combination of piece-wise linear basis functions.

  • The solution is completely defined by a Gaussian vector of weights with zero mean and a sparse precision matrix.
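The 1D piecewise linear ("hat") basis functions described above can be sketched in base R; this illustrates the properties \(\psi_i(t_i)=1\), \(\psi_i(t_j)=0\) for \(i \neq j\), and linear interpolation between nodes:

```r
# hat basis function psi_i evaluated at points t, for a given node set
hat_basis <- function(t, i, nodes) {
  n <- length(nodes)
  sapply(t, function(ti) {
    if (i > 1 && ti >= nodes[i - 1] && ti <= nodes[i]) {
      (ti - nodes[i - 1]) / (nodes[i] - nodes[i - 1])   # rising edge
    } else if (i < n && ti >= nodes[i] && ti <= nodes[i + 1]) {
      (nodes[i + 1] - ti) / (nodes[i + 1] - nodes[i])   # falling edge
    } else if (ti == nodes[i]) 1 else 0
  })
}

# the field is the weighted sum Z(t) = sum_i psi_i(t) w_i
field_at <- function(t, nodes, w) {
  B <- vapply(seq_along(nodes), function(i) hat_basis(t, i, nodes),
              numeric(length(t)))
  as.vector(B %*% w)
}
```

With nodes 0,1,2,3,4 and weights w, field_at() returns exactly w at the nodes and linearly interpolates between them.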

The SPDE approach on 2D

Now we approximate the GRF using a triangulated mesh.

The SPDE approach approximates the continuously indexed Gaussian field by a Gaussian Markov random field (GMRF) defined on the nodes of this mesh.

  • We construct an appropriate lower-resolution approximation of the surface by sampling it at a set of well-designed points and constructing a piece-wise linear interpolant.

  • Note that \(\nu = \alpha - d/2\). For \(\alpha = 2\) and \(d = 2\), we have \(\nu = 1\), and:

Note

\[ \begin{aligned} Z(s) &= \sum_{i = 1}^K\psi_i(s)w_i \\ \mathbf{w} &\sim N(\mathbf{0},Q^{-1}) \leftarrow \text{GMRF}\\ Q &= \tau^2(\kappa^4 \mathbf{C} + 2\kappa^2 \mathbf{G}+\mathbf{G}\mathbf{C}^{-1}\mathbf{G}) \end{aligned} \]

  • \(\mathbf{C}\) is diagonal with entries \(C_{ii} =\int \psi_i(s)\mathrm{d}s\) and measures how much of the domain each basis function covers.

  • \(G_{ij} = \int \nabla \psi_i(s) \nabla \psi_j(s) \mathrm{d}s\) reflects the connectivity of the mesh nodes.

  • Because each basis function overlaps only with nearby ones, the resulting precision matrix \(Q\) is sparse, meaning each weight depends directly only on its neighbors.
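To make the sparsity concrete, here is a hypothetical 1D finite-element illustration on a regular mesh, using the lumped (diagonal) mass matrix \(\mathbf{C}\) and the stiffness matrix \(\mathbf{G}\), and then assembling the \(\alpha = 2\) precision \(Q = \tau^2(\kappa^4 \mathbf{C} + 2\kappa^2 \mathbf{G}+\mathbf{G}\mathbf{C}^{-1}\mathbf{G})\):

```r
# hypothetical 1D illustration: T mesh nodes on a regular grid
nodes <- seq(0, 10, by = 1)
Tn    <- length(nodes)
hs    <- diff(nodes)

# lumped mass matrix: C_ii = integral of psi_i over the domain
C <- diag(c(hs[1] / 2, (hs[-1] + hs[-(Tn - 1)]) / 2, hs[Tn - 1] / 2))

# stiffness matrix: G_ij = integral of grad psi_i * grad psi_j
G <- matrix(0, Tn, Tn)
for (k in 1:(Tn - 1)) {
  idx <- c(k, k + 1)
  G[idx, idx] <- G[idx, idx] + matrix(c(1, -1, -1, 1), 2, 2) / hs[k]
}

tau <- 1; kappa <- 2
Q <- tau^2 * (kappa^4 * C + 2 * kappa^2 * G + G %*% solve(C) %*% G)
# Q is symmetric and banded: each node interacts only with near neighbours
```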

In summary

  • The continuous Matérn GRF is the solution of an SPDE and is represented as

\[ Z(s) = \sum_{i = 1}^K\psi_i(s)w_i \]

  • The weights vector \(\mathbf{w} = (w_1,\dots,w_K)\) is Gaussian with a sparse precision matrix \(\longrightarrow\) Computational convenience

  • The field has two parameters

    • The range \(\rho\)
    • The marginal variance \(\sigma^2\)
  • These parameters are linked to the parameters of the SPDE

  • We need to assign priors to them

Penalized Complexity (PC) priors

Penalized Complexity (PC) priors proposed by Simpson et al. (2017) allow us to control the amount of spatial smoothing and avoid overfitting.

  • PC priors shrink the model towards a simpler baseline unless the data provide strong evidence for a more complex structure.
  • To define priors for the marginal standard deviation \(\sigma\) and the range parameter \(\rho\), we use the probability statements:
    • Define the prior for the range: \(\text{Prob}(\rho<\rho_0) = p_{\rho}\)
    • Define the prior for the standard deviation: \(\text{Prob}(\sigma>\sigma_0) = p_{\sigma}\)

Learning about the SPDE approach

  • F. Lindgren, H. Rue, and J. Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: The SPDE approach (with discussion). In: Journal of the Royal Statistical Society, Series B 73.4 (2011), pp. 423–498.

  • H. Bakka, H. Rue, G. A. Fuglstad, A. Riebler, D. Bolin, J. Illian, E. Krainski, D. Simpson, and F. Lindgren. Spatial modelling with R-INLA: A review. In: WIREs Computational Statistics 10:e1443.6 (2018). (Invited extended review). DOI: 10.1002/wics.1443.

  • E. T. Krainski, V. Gómez-Rubio, H. Bakka, A. Lenzi, D. Castro-Camilo, D. Simpson, F. Lindgren, and H. Rue. Advanced Spatial Modeling with Stochastic Partial Differential Equations using R and INLA. CRC Press, 2018.

Example: Modelling Rainfall in Brazil

Let's revisit the parana data containing the average rainfall over different years for the period May to June at 143 monitoring stations in Paraná state, Brazil.

Rainfall values measured at 143 recording stations in Paraná state, Brazil with low values being represented in blue and high values in red.

The Model

  • Stage 1 Model for the response \[ y(s)|\eta(s)\sim\text{Normal}(\mu(s),\sigma^2_e) \]

  • Stage 2 Latent field model \[ \eta(s) = \mu(s) = \beta_0 + Z(s) \]

    • A global intercept \(\beta_0\)
    • A Gaussian field \(Z(s)\)
  • Stage 3 Hyperparameters

    • Precision of the observational error, \(\tau_e = 1/\sigma^{2}_e\)
    • Range \(\rho\) and marginal standard deviation \(\sigma\) of the Gaussian field
Step 1: Define the SPDE representation: The mesh

First, we need to create the mesh used to approximate the random field.

library(fmesher)
library(inlabru)
library(INLA)

mesh <- fm_mesh_2d(
  loc = parana_sf, 
  offset = c(50, 100),
  cutoff = 1,
  max.edge = c(30, 60)
)
  • max.edge for maximum triangle edge lengths
  • offset for inner and outer extensions (to prevent edge effects)
  • cutoff to avoid overly small triangles in clustered areas

Step 1: Define the SPDE representation: The mesh

  • All random field models need to be discretised for practical calculations.

  • The SPDE models were developed to provide a consistent model definition across a range of discretisations.

  • We use finite element methods with local, piecewise linear basis functions defined on a triangulation of a region of space containing the domain of interest.

  • Deviation from stationarity is generated near the boundary of the region.

  • The choice of region and choice of triangulation affects the numerical accuracy.

Step 1: Define the SPDE representation: The mesh

  • If the mesh is too fine \(\rightarrow\) heavy computation

  • If the mesh is too coarse \(\rightarrow\) not accurate enough

Step 1: Define the SPDE representation: The mesh

Some guidelines

  • Create triangulation meshes with fm_mesh_2d():

  • edge length should be around a third to a tenth of the spatial range

  • Move undesired boundary effects away from the domain of interest by extending to a smooth external boundary:

  • Use a coarser resolution in the extension to reduce computational cost (max.edge=c(inner, outer)), i.e., add extra, larger triangles around the border

Step 1: Define the SPDE representation: The mesh

  • Use a fine resolution (subject to available computational resources) for the domain of interest (the inner correlation range) and avoid small edges, i.e. filter out small input point clusters (0 \(<\) cutoff \(<\) inner)

  • Coastlines and similar can be added to the domain specification in fm_mesh_2d() through the boundary argument.

  • simplify the border

Step 2: Define the SPDE representation: The SPDE

We use inla.spde2.pcmatern to define the SPDE model, specifying PC priors through the following probability statements.

  • The Paraná state is around 663.87 kilometers wide by 464.75 kilometers high.
  • The PC-prior for the practical range is built considering the probability of the practical range being less than a chosen distance.
  • We chose to set the prior considering the median as 100 kilometers.
  • \(P(\rho < 100) = 0.5\)

  • \(P(\sigma > 1) = 0.5\)
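Putting these statements into code, a sketch assuming the mesh object built in Step 1 (in inla.spde2.pcmatern, prior.range and prior.sigma each take a (threshold, probability) pair):

```r
library(INLA)

# PC-prior Matern SPDE model on the mesh from Step 1 (alpha = 2 by default)
spde_model <- inla.spde2.pcmatern(
  mesh        = mesh,
  prior.range = c(100, 0.5),  # P(range < 100 km) = 0.5
  prior.sigma = c(1, 0.5)     # P(sigma > 1)      = 0.5
)
```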

Fit the Model

The Model

\[ \begin{aligned} y(s)|\eta(s)&\sim\text{Normal}(\mu(s),\sigma^2_e)\\ \eta(s) & = \beta_0 + Z(s)\\ \end{aligned} \]

parana_sf %>% print(n = 3)
Simple feature collection with 143 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 150.122 ymin: 70.36 xmax: 768.5087 ymax: 461.9681
CRS:           NA
First 3 features:
   value                  geometry
1 306.09 POINT (402.9529 164.5284)
2 200.88  POINT (501.7049 428.771)
3 167.07 POINT (556.3262 445.2706)

The code

# define model component
cmp = ~ -1 + Intercept(1) + space(geometry, model = spde_model)

# define model predictor
eta = value ~ Intercept  + space

# build the observation model
lik = bru_obs(formula = eta,
              data = parana_sf,
              family = "gaussian")

# fit the model
fit = bru(cmp, lik)


Results

Parameter                                  Mean     2.5%    97.5%
Intercept                               249.714  232.748  264.983
Precision for the Gaussian observations   4.482    3.197    5.511
Range for space                          57.328   46.234   70.330
Stdev for space                          46.654   41.222   52.736

  • Intercept represents the average rainfall level
  • Precision for the Gaussian observations is the precision of the observational error
  • Range for space is the practical correlation range of the Matérn field
  • Stdev for space is the marginal standard deviation of the Matérn field
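These summaries can be extracted from the fitted object; since the bru() fit inherits from an INLA object, the standard INLA summary fields apply (a sketch, assuming the fit object from the previous slides):

```r
# posterior summaries of the fixed effects (the intercept)
fit$summary.fixed

# posterior summaries of the hyperparameters
# (observation precision, Matern range and stdev)
fit$summary.hyperpar
```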

Spatial predictions

How do we predict at unsampled locations?

In geostatistical applications, the main interest resides in the spatial prediction of the latent field, or of the response variable, at new locations.

  • Suppose we observe a spatial process \(\{Z(s): s \in \mathcal{D}\}\) at locations \(s_1,\dots,s_n\).

  • Our goal: predict the variable of interest at an unobserved location \(s_0 \in \mathcal{D}\).

    given the data \(y = (y_1,\dots,y_n)\), what can we say about \(Z(s_0)\)?

  • Rather than a single guess, we want a full uncertainty-aware prediction.

  • In a Bayesian setting, prediction is a probabilistic task.

Posterior predictive density

The key lies in the posterior predictive distribution

\[ \pi(\tilde{Y} \mid y) = \int \pi(\tilde{Y} \mid \Theta, y)\, \pi(\Theta \mid y)\, d\Theta, \] where \(\Theta\) denotes all latent components and hyperparameters.

  • The prediction likelihood \(\pi(\tilde{Y} \mid \Theta, y)\) depends on the task:
    • extrapolation (e.g. forecasting with an AR(1)): \(\pi(Y_{n+1} \mid \Theta, y_n)\),
    • interpolation: \(\pi(Y_i \mid y_{i-1}, y_{i+1}, \Theta)\),
  • Spatial prediction fits naturally into this framework:
    • \(\tilde{Y}\) may represent \(Z(s_0)\), \(\eta_0\), or the response \(y(s_0)\),
    • conditioning reflects the assumed spatial dependence.
  • INLA approximates \(\pi(\Theta \mid y)\) efficiently, enabling full uncertainty propagation when predicting over \(s_0 \in \mathcal{D}\).

Posterior predictive density

Step 0 — Augment the data with prediction locations

  • Introduce locations where predictions are desired.

  • The response is set to NA at these locations.

  • Covariates and spatial coordinates are still provided.

dims = c(150, 150)
pred.df <- fm_pixels(mesh, dims = dims, mask = border, format = "sf")

Step 1 — Build the projector matrix A

  • A projector matrix \(A\) links the latent Gaussian field to the prediction locations.

  • This ensures that the spatial effects and covariates at new locations are properly included in the model.

  • Linear predictors are computed at these new locations.

Step 2 — Joint posterior inference

  • INLA computes the posterior of:

    • the latent Gaussian field,

    • fixed effects,

    • hyperparameters.

Step 3 — Posterior evaluation at prediction locations

  • Predictions come from the posterior marginals of the latent field and linear predictor.

  • Outputs include posterior means, variances, and credible intervals.

pred <- predict(fit, pred.df,
                formula ~ data.frame(
    spde = space,
    eta = Intercept + space
  )
)

Example: Modelling Pacific Cod Biomass Density

In the next example, we will explore data on the Pacific Cod (Gadus macrocephalus) from a trawl survey in Queen Charlotte Sound.

  • The dataset contains the biomass density (kg/km\(^2\)) of Pacific Cod in the area swept for a given survey in 2003, as well as depth covariate information.

Exploratory plots

Envelope Variogram considering only where biomass \(>\) 0

[1] "There are 2 out of 15 variogram estimates outside the 95% envelope."


We thus have a dilemma.

  • If we omit the zeros, we get an accurate model fit for the non-zero data, but we throw away every observation with a zero.
  • If we include the zeros, we keep all the data, but we get a poorly fitting model that both under- and over-predicts values.
  • So what do we do?

A non-spatial Model

  • Stage 1 Model for the response \[ y(s)|\eta(s)\sim \text{Tweedie}(p,\mu(s),\phi) \]

    • The Tweedie distribution accounts for positive continuous density values that also contain exact zeros

    • \(p\) determines the shape of the variance function (shifting from a Poisson distribution at \(p=1\) to a gamma distribution at \(p=2\))

    • \(\mu(s) = \exp\eta(s)\) is the mean, linked to the linear predictor by the log link.

    • \(\phi\) is the dispersion parameter.

  • Stage 2 Latent field model \[ \eta(s) = \log(\mu(s)) = \beta_0 + \beta_1 \text{depth}(s) + \beta_2 \text{depth}(s)^2 \]

  • Stage 3 Hyperparameters

A non-spatial Model

  • Stage 1 Model for the response \[ y(s)|\eta(s)\sim \text{Tweedie}(p,\mu(s),\phi) \]
  • Stage 2 Latent field model \[ \eta(s) = \log(\mu(s)) = \beta_0 + \beta_1 x(s) + \beta_2 x(s)^2 \]
    • A global intercept \(\beta_0\)
    • A linear effect of the covariate \(x(s)\) (depth)
    • A quadratic effect of the covariate \(x(s)\) (depth)
  • Stage 3 Hyperparameters

A non-spatial Model

  • Stage 1 Model for the response \[ y(s)|\eta(s)\sim \text{Tweedie}(p,\mu(s),\phi) \]

  • Stage 2 Latent field model \[ \eta(s) = \log(\mu(s)) = \beta_0 + \beta_1 x(s) + \beta_2 x(s)^2 \]

  • Stage 3 Hyperparameters

    • dispersion parameter \(\phi\)
    • power parameter \(p\)
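To make the roles of \(p\) and \(\phi\) concrete: the Tweedie family has variance function \(\text{Var}(Y) = \phi\,\mu^p\), so \(p=1,\phi=1\) reproduces the Poisson mean–variance relation and \(p=2\) the gamma one, with \(1<p<2\) giving a compound Poisson–gamma distribution that places positive mass at exactly zero. A small illustrative sketch (the last call plugs in posterior means from the results table, purely for illustration):

```python
def tweedie_variance(mu, p, phi):
    """Tweedie mean-variance relationship: Var(Y) = phi * mu**p.
    For 1 < p < 2 the distribution is compound Poisson-gamma,
    which puts positive probability mass at exactly zero biomass."""
    return phi * mu ** p

print(tweedie_variance(4.0, p=1.0, phi=1.0))    # Poisson-like: Var = mean = 4.0
print(tweedie_variance(4.0, p=2.0, phi=1.0))    # gamma-like: Var = mu^2 = 16.0
print(tweedie_variance(4.0, p=1.643, phi=3.804))  # posterior means from the table
```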

A non-spatial Model

INLA Model Results
Posterior summaries of fixed effects and hyperparameters
Parameter Mean 2.5% Quantile 97.5% Quantile
Intercept 3.866 3.690 4.043
depth −1.269 −1.480 −1.059
depth\(^2\) −1.077 −1.256 −0.898
Tweedie power parameter \(p\) 1.643 1.617 1.668
Tweedie dispersion parameter \(\phi\) 3.804 3.530 4.093
  • The \(\beta\) coefficients suggest log-biomass peaks at an intermediate depth within the study range and decreases toward both the shallower and deeper extremes.
  • \(\phi>1\) indicates overdispersion relative to the variance function, potentially caused by unobserved heterogeneity.
  • Tweedie models fitted to biomass data often have convergence issues when large spatial areas contain many zeros.
  • Is there a better approach?

A multilikelihood Hurdle Geostatistical Model

  • Stage 1 Model for the response(s) \[ \begin{aligned} y_i|\eta^{(1)}_i&\sim \text{Binomial}(1,\pi_i)\\ \log(z_i)|\eta^{(2)}_i&\sim \text{Normal}(\mu_i,\tau_e^{-1})\\ \end{aligned} \]

    • We then define a likelihood for each outcome.

      • \(y_i =\begin{cases} 1 &\text{if fish were caught at location } \mathbf{s}_i \\ 0 &\text{otherwise}\end{cases}\)

      • \(z_i =\begin{cases} NA &\text{if no fish were caught at location } \mathbf{s}_i \\ \text{biomass density at location } \mathbf{s}_i &\text{otherwise}\end{cases}\)

  • Stage 2 Latent field model \[ \begin{aligned} \eta^{(1)}_i &= \text{logit}(\pi_i) = X'\beta + \xi_i\\ \eta^{(2)}_i &= \mu_i = X'\alpha + \omega_i \end{aligned} \]

  • Stage 3 Hyperparameters

A multilikelihood Hurdle Geostatistical Model

  • Stage 1 Model for the response(s) \[ \begin{aligned} y_i|\eta^{(1)}_i&\sim \text{Binomial}(1,\pi_i)\\ \log(z_i)|\eta^{(2)}_i&\sim \text{Normal}(\mu_i,\tau_e^{-1})\\ \end{aligned} \]
  • Stage 2 Latent field model \[ \begin{aligned} \eta^{(1)}_i &= \text{logit}(\pi_i) = X'\beta + \xi_i\\ \eta^{(2)}_i &= \mu_i = X'\alpha + \omega_i \end{aligned} \]
    • \(\{\alpha,\beta\}\) are the intercepts and covariate effects.
    • \(\{\xi,\omega\}\) are Gaussian fields with Matérn covariance (a separate field for each outcome).
  • Stage 3 Hyperparameters

A multilikelihood Hurdle Geostatistical Model

  • Stage 1 Model for the response(s) \[ \begin{aligned} y_i|\eta^{(1)}_i&\sim \text{Binomial}(1,\pi_i)\\ \log(z_i)|\eta^{(2)}_i&\sim \text{Normal}(\mu_i,\tau_e^{-1})\\ \end{aligned} \]

  • Stage 2 Latent field model \[ \begin{aligned} \eta^{(1)}_i &= \text{logit}(\pi_i) = X'\beta + \xi_i\\ \eta^{(2)}_i &= \mu_i = X'\alpha + \omega_i \end{aligned} \]

  • Stage 3 Hyperparameters

    • observational error (nugget) \(\tau_e\)
    • Matérn field(s) parameters \(\{\rho^{(1)},\rho^{(2)},\tau_{d}^{(1)},\tau_{d}^{(2)}\}\)
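Before fitting, each observation must be split into the two hurdle outcomes defined in Stage 1. A minimal sketch of that preprocessing, assuming the biomass densities sit in a plain list (in the actual analysis this is done on the R data frame, with NA rather than NaN):

```python
import math

def hurdle_responses(biomass):
    """Split biomass densities into the two hurdle outcomes:
    y_i = 1 if fish were caught (biomass > 0), else 0;
    z_i = log(biomass) where positive, NaN (standing in for NA) otherwise."""
    y = [1 if b > 0 else 0 for b in biomass]
    z = [math.log(b) if b > 0 else math.nan for b in biomass]
    return y, z

# Four hypothetical trawls, two of them empty
biomass = [0.0, 12.5, 0.0, 3.2]
y, z = hurdle_responses(biomass)
print(y)  # [0, 1, 0, 1]
```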

The mesh and SPDE representation

Hurdle Model Results

Parameter Mean 2.5% Quantile 97.5% Quantile
\(\alpha_0\) 3.498 2.885 4.227
\(\alpha_1\) −0.509 −1.098 0.059
\(\alpha_2\) −0.339 −0.789 0.106
\(\beta_0\) 1.761 −3.215 7.267
\(\beta_1\) −2.564 −3.626 −1.694
\(\beta_2\) −1.495 −2.143 −0.944
\(\tau_e\) 1.152 0.464 2.575
\(\rho^{(1)}\) 36.647 6.874 103.820
\(\tau_{d}^{(1)}\) 1.001 0.571 1.658
\(\rho^{(2)}\) 158.333 57.011 398.763
\(\tau_{d}^{(2)}\) 2.201 1.192 3.768
  • \(\beta_0\) is the baseline log-odds of catching fish (the logit-scale intercept of the binomial part)
  • \(\alpha_0\) is the predicted log-biomass density at the average depth (since the covariates have been scaled)
  • The coefficients \(\beta_1,\beta_2\) give the change in the log-odds of catching fish per unit increase in depth and depth\(^2\) respectively.
  • The coefficients \(\alpha_1,\alpha_2\) suggest log-biomass decreases with depth, although their credible intervals include zero.
  • \(\rho^{(1)},\rho^{(2)}\) suggest the spatial correlation decays at about 36.65 and 158.33 km respectively (the study region spans approximately 46,000 km\(^2\))
  • The unstructured (nugget) variability is given by \(\tau_e^{-1}\), while \(\{\tau_{d}^{(1)-1},\tau_{d}^{(2)-1}\}\) represent the spatially structured variability.
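The practical meaning of the range estimates can be checked against the Matérn correlation function. For the smoothness \(\nu=1\) typically used with the SPDE approach, the correlation at distance \(d\) is \((\kappa d)K_1(\kappa d)\) with \(\kappa=\sqrt{8}/\rho\), which falls to roughly 0.13 at \(d=\rho\). A pure-Python sketch (Bessel \(K_1\) evaluated by naive numerical integration, so only illustrative accuracy):

```python
import math

def bessel_k1(x, steps=20000, t_max=12.0):
    """Modified Bessel function K_1(x) via its integral representation:
    K_1(x) = integral_0^inf exp(-x*cosh(t)) * cosh(t) dt (trapezoidal rule)."""
    dt = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * dt
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(-x * math.cosh(t)) * math.cosh(t)
    return total * dt

def matern_nu1_correlation(d, rho):
    """Matérn (nu = 1) correlation: (kappa*d) * K_1(kappa*d), kappa = sqrt(8)/rho."""
    if d == 0:
        return 1.0
    kd = math.sqrt(8.0) * d / rho
    return kd * bessel_k1(kd)

# At the estimated range of the binomial field (~36.6 km)
print(matern_nu1_correlation(36.6, 36.6))  # correlation ≈ 0.14 at d = rho
```

This is consistent with the usual reading of the SPDE range as the distance at which spatial correlation has decayed to roughly 0.1.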

Model comparison

Note that in this hurdle model there is no direct link between the parameters of the two observation parts.

  • The two likelihoods could share some components; for example, the same Matérn field could be used in both predictors.

  • What do the previous results suggest about the estimated covariance parameters for the two fields? Is it sensible to share the same component between the two parts?

  • We will fit a model that estimates a single shared field and compare it with our two previous models.

The model being fitted is now:

\[ \begin{aligned} y_i|\eta^{(1)}_i&\sim \text{Binomial}(1,\pi_i)\\ \eta^{(1)}_i &= \text{logit}(\pi_i) = X'\beta + \color{red}{\xi_i}\\ \log(z_i)|\eta^{(2)}_i&\sim \text{Normal}(\mu_i,\tau_e^{-1})\\ \eta^{(2)}_i &= \mu_i = X'\alpha + \color{red}{\xi_i} \end{aligned} \]


Model DIC WAIC MLIK
Tweedie 1,599.690 1,612.043 −979.532
Hurdle 1,185.793 1,200.690 −660.936
Hurdle 2 (shared field) 1,227.128 1,227.340 −666.295

  • The hurdle model with two separate fields attains the lowest DIC and WAIC and the highest marginal likelihood, suggesting that sharing a single field is not supported here.

Spatial predictions

We need to compute:

  • \(\pi(s)\), the catching probability

  • \(\mathbb{E}[Z(s)\mid Y(s)=1] = \exp\left(\mu(s) + \dfrac{1}{2\tau_{e}}\right)\), the mean of the lognormal biomass given that fish are caught

  • \(\mathbb{E}[Z(s)] = \pi(s)\times \mathbb{E}[Z(s)\mid Y(s)=1]\), the unconditional expected biomass
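Putting the three quantities together for a single prediction location: a minimal sketch, where the linear-predictor values plugged in at the bottom are invented for illustration (only \(\tau_e\) is taken from the results table):

```python
import math

def expected_biomass(eta1, mu, tau_e):
    """Hurdle-model prediction at one location:
    pi(s)    = inverse-logit of the binomial linear predictor,
    E[Z|Y=1] = lognormal mean exp(mu + 1/(2*tau_e)),
    E[Z]     = pi(s) * E[Z|Y=1]."""
    pi_s = 1.0 / (1.0 + math.exp(-eta1))            # catching probability
    cond_mean = math.exp(mu + 1.0 / (2.0 * tau_e))  # biomass given a catch
    return pi_s * cond_mean

# Hypothetical posterior means of the two linear predictors at one site
print(expected_biomass(eta1=0.5, mu=3.0, tau_e=1.152))
```

Note the \(1/(2\tau_e)\) term: back-transforming a lognormal mean without it would systematically under-predict biomass.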